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ABSTRACT 

We present an analysis of a 50 orbit HST ACS observation of the M87 globular 
cluster system. We use the extraordinary depth of this dataset to test whether 
the colors and magnitudes show evidence for a mass-metallicity relation in globu- 
lar cluster populations. We find only a weak or absent relation between the colors 
and magnitudes of the metal poor subpopulation of globular clusters. The weak- 
ness or absence of a color-magnitude relation is established over a wide range 
in luminosity from My = —11 to My = —6, encompassing most of the M87 
globular clusters. The constancy of the colors of the metal-poor subpopulation 
seen in our 50 orbit observation is in contrast to suggestions from single orbit 
ACS data that the metal-poor globular clusters in M87 and several other galaxies 
show a "blue tilt." The formal best fit for the mass-metallicity relation for the 
metal-poor subpopulation in our much deeper data is Z (x mo o8±o o5 q^j. anal- 
ysis of these data also shows a possible small "red tilt" in the metal-rich globular 
cluster subpopulation. While either of these small tilts may be real, they may 
also illustrate the limit to which mass-metallicity relations can be determined, 
even in such extraordinarily deep data. We specifically test for a wide range of 
systematic effects and find that while small tilts cannot be confirmed or rejected, 
the data place a strong upper limit to any tilt of |0.20| ± 0.05. This upper limit 
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is much smaller than some earlier claims from single orbit data, and strongly 
limits self-enrichment within globular clusters. This mass-metallicity relation for 
globular clusters is also shallower than the relation for galaxies, suggesting that 
the formation mechanisms for these two types of objects are different. 

Subject headings: galaxies: star clusters - galaxies: individual (M87) - globular 
clusters: general 



Introduction 



The colo r distribution of globular clusters in most gal axies has long been known to be 



bimodal (e.g. Kundu &: Whitmore 2001 : Larsen et ahlboOll ). The standard understanding of 



this feature is that the red and blue subpopulations for a given galaxy represent a difference 
in metallicity, with the blue clusters being on average more metal poor than the red clus- 
ters. The correspondence of the blue clusters with lower metallicity and red clusters with 
higher metallicity follows directly fr om studies of the Milky Way globular cluster system (e.g. 
Ashman &: Zepilll998l : iHarrid Il991l). and has b e en confirmed in a number of extragalac tic 
globular cluster systems (e.g. ICohen et al.ll2003l : IStrader et al.l 120071 : iKundu fc Zepj 120071 ). 



Recently, several studies of the globular cluster systems of early type galaxies have 
suggested a relation between the color and luminos i ty of the blue metal-poor clusters, with 



brigh ter clusters appearing more red (IHarris et al.l l2006l : IStrader et al.l l2006l : iMieske et al. 
20061 ). This "blue tilt" has been taken as evidence of a relation between the mass and 



metallicity of these clusters. If such a trend w ere real, it would sugges t an important role for 
self-enrichment in luminous globular clusters (IStrader &: Smithll2008l ). It might also suggest 
that globular clusters are surrounded by more massive halos when they form, in order to 
allow them to retain metals from multiple generations of stars. Alternatively, the absence 
of a mass-metallicity relation would strongly limit the degree to which self-enrichment is a 
factor in the observed cluster metallicities. Such an absence also emphasizes the differences 
between the formation histories of globular clusters and galaxies, which are known to have 
a strong mass-metallicity relation. 

Firmly establishing any such relation requires determining the color and luminosities of 
the globular clusters over a large range in brightness to detect any effect over a significant 
luminosity and mass scale. Unfortunately, many of the results that show evidence of this 
blue tilt do so using fairly shallow single orbit ACS data, with comparably low signal to 
noise for the remaining more distant galaxies. In this paper, we present the results of an 
extraordinarily deep 50 orbit ACS study of the central region of M87. This deep data allows 



- 3 - 



clusters to be detected over the entire range of cluster magnitudes, providing a sample that 
is effectively complete. No trend between the mass and metallicity is observed for the blue 
globular cluster population, suggesting that the previously published results are influenced 
strongly by the much lower signal to noise in that data. Section [2] discusses the data and the 
photometric reduction used to ensure accurate cluster measurements. Section [3] reviews the 
KMM algorithm used to determine the peaks of the color distributions, and the sensitivity 
of the results to changes in the binning methods. Finally, section H] presents the results for 
our sample of clusters, and section \5\ discusses these results in the context of the formation 
history of the globular cluster system. 



2. Observations and Reductions 

The data for this project come from a 50 orbit observation program with the Advanced 
Camera for Surveys (ACS) aboard HST (PI: Baltz, Proposal ID: 10543). The images are of 
the central region of the giant e lliptical galaxy M87 , extending out to a projected radius of 
8 kpc (taking m — M = 31.021 iMacri et al. ] Jl999h l The data were taken over the course 



of three months, as part of a search for microlensing events, which require repeat visits to 
find changes in brightness. These multiple images of the same field yield data that can be 
combined into very deep exposures. On each observing day, four exposures in F814W were 
taken with dithered pointing along with a single matching exposure in F606W. The four 
F814W images provide a fully dithered image for each day, with the F606W yielding full 
coverage over the entire set of observations. The images were combined using Multidrizzle 



(IFruchter fc Hookl 120021 ) to a resolution of 0'.'05 pixel"^, the nominal resolution of ACS. 
Although the dither pattern for this dataset is more than sufficient to allow higher resolution 
final images to be constructed, this is not necessary for the identification and photometry of 
the globular clusters. At the resolution used, the globular clusters are significantly broader 
than the ACS PSF, so retaining the nominal ACS resolution provides the highest possible 
signal to noise. In all, 49 F606W and 205 F814W images were combined to yield final images 
with exposure times of ty = 24500 s and tj = 73800 s, making these some of the deepest 
images ever taken with HST. In addition to the exposures used, 8 F606W and 13 F814W 
were taken but excluded from analysis due to a loss in the guide star tracking. 

The final drizzled image contains the strongly varying galaxy light. As the main source 
of noise in the final image is due to the variations in this galaxy light from pixel to pixel, 
constructing an accurate model of the galaxy is essential to estimating the detection efficiency 
across the image. We use a model of the galaxy determined from isophote fitting, but to 
optimize this fit, it is advantageous to remove sources other than the galaxy light. In addition 
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to the rich globular cluster population, these sources include background galaxies and the 
optically bright jet emanating from the galaxy core. In order to identify these sources, we 
first create a model of the galaxy by median filtering the image, with a box size chosen to be 
larger than the expected size of any features on the image (taken to be 100 x 100 pixels for 
this resolution). This galaxy model does a poor job near the galaxy center, where the filtering 
box is unable to deal with the steep light profile. In addition, bright globular clusters tend 
to bias the model, leading to oversubtraction around such objects. Despite these issues, the 
model subtracted data image is significantly smoother than the initial image. This image is 
then used to construct a mask to remove the effect of the globular clusters and jet. All pixels 
that are 3a above the local mean (calculated again in a 100 x 100 pixel box) are added to the 
mask image. This method does an excellent job of cleaning the image of the contribution 
from sources other than the galaxy light. The STSDAS ELLIPSE and BMODEL tasks 
were then used to create a smooth model of the galaxy, based on the original and mask 
images. The mask ensures that bright clusters do not bias the ELLIPSE fitting, and that 
the galaxy core profile is not influenced by the jet. ELLIPSE only runs for annuli which 
are more that 50% complete (not masked or off the frame). To rectify this, additional C 
code was written to calculate a continuation of the ELLIPSE model into the image corners, 
under the assumption that the position angle and ellipticity are fixed at the last value found 
by ELLIPSE. This final model was then subtracted from the original drizzled image, along 
with a constant background level, calculated to be the mode of the galaxy subtracted image 
histogram. 



Source Extractor (IBertin fc Arnoutslll996l ) was used to generate a database of objects 



detected on the images. To ensure that the detection efficiency was similar across the image, 
the light from the galaxy was subtracted and used to weight the detection process. As the 
noise on the final data image is highly dependent on the galaxy flux, a constant detection 
threshold would miss many objects near the center of the galaxy. This weighting scales the 
detection threshold to more accurately reflect the local image noise. A detection threshold of 
3a was used for this project, with a minimum area for objects of 2 pixels. This area criterion 
helps limit small noise spikes from being detected as potential clusters. To resolve this issue 
further, we also set the requirement that an object must be detected on both the F606W 
and F814W images to be considered for further analysis. 

All objects that were detected in our images were measured in ten apertures from r = 1 
pixel to r = 10 pixels. As the clusters are resolved in our images, the differences in the sizes 
of clusters of equal brightness can greatly change the light detected within a fixed aperture. 
Therefore, a size dependent aperture correction was constructed to accurately measure the 
total cluster flux. To calibrate s uch a correct ion, we created simulated globular cluster images 



by convolving two dimensional iKingI (119661 ) models with the instrumental PSF, calculated 
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using the empirical ACS PSFs of lAnderson fc King (120061 ) . These simulated clusters were 



generated with a fixed concentration c = 1.03 over a grid of apparent instrumental magni- 
tudes and tidal radii. At each grid point, 200 simulated clusters were randomly added to the 
galaxy subtracted images, and then processed through the same detection and measurement 
process as for the data images. 

Based on this large sample of simulated clusters, we adopted a radius of 4 pixels to 
measure the cluster light, and then determine an aperture correction from this radius to 
infinity. This correction was parameterized by an estimate of the size of the object, defined 
to be the difference in the measured magnitudes within two radii: 71 = m^pxi — m2pxi- The 
logic behind this parameter is that a very small object (with a radius smaller than 2 pixels) 
will have approximately the same magnitude in both apertures, yielding a value of 7?. ~ 0. 
As an object increases in size, more light is measured in the large aperture compared to the 
smaller, which pushes TZ to more negative values. 

The grid of simulated clusters allows for the easy construction of an aperture correction. 
For each grid point in {m,rt) simulated, the median m^pixei and IZ are found among the set of 
simulated clusters detected by Source Extractor. These values are used as samples of the 
aperture correction: 

wA(7?., TTL^pixel) "^simulated "^ipixel (1) 

The upper end of this correction is fixed, such that A^Q^m^pixei) = 0. For each real object 
detected in the galaxy subtracted data image, the correct aperture correction is found by 
interpolating on the grid of {TZ^m^pixei)- This aperture correction is mainly a function of 
7?., with very little dependence on the cluster magnitude. This is exactly what is expected, 
as any magnitude dependence would suggest that the photometry is not uniform. The 
median aperture corrections in our two filters are —0.61 and —0.71 for F606W and F814W 
respectively, and vary directly with the size of the cluster. 

The accuracy of these aperture corrections can be checked for the data clusters by 
fitting the images with PSF convolved King models (jKing 1966 ). Each of the clusters 



m 



this sample had such King models fit to their F814W image, which has higher signal to 
noise t han the F606W dat a. The exact details of this procedure will be presented in a future 



paper (IWaters et al.ll2008l ). The aperture correction correlates well with the best fitting tidal 



radius of the clusters 



Afgoqw = -0.60 - 0.0013 x n 
Afsuw = -0.70 - 0.0014 X n 



which illustrates the importance of including these size effects. The magnitudes of the 
best fitting King models provide a more detailed measurement of a given cluster's total 
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flux. Over the range of magnitudes wliere tliis fitting is able to faithfully fit the cluster 
images (/ < 25), the aperture magnitudes and the King model magnitudes agree well, with 
'^aperture — i^King = —0.02 ± 0.12. This Scatter is consistent with our expected photometric 
uncertainties, and illustrates that this aperture correction strategy is a robust way to measure 
the light from the clusters with a wide range of sizes. 

The PSFs used in the ge neration of the simulated clusters extend out to O'.'S from their 
core (lAnderson fc King] 120061 ) . The ACS detector is known to have large angle scattering 
beyond this limit, which will not be accounted for, even with this radius dependent aperture 
correction. Because of this scattering, additional —0.088 and —0.087 magnitude corrections 
were apphed to the F606W and F814W instru mental magnitudes, respectively, to correct 
them to an infinite aperture (ISirianni et al. To compare this data to previous studies of 

globular clusters, we converted the instrumental F606W and F814W m agnitudes to standar d 
V and I. This was done using the color correction formulas presented by lSirianni et al.l (120051 ). 
The final magnitudes were t hen corrected for ext inction (—0.074 and —0.043 for V and I), 
using the extinction maps of ISchlegel et al.l (119981 ). 



This ACS pointing c ontain s the majority of the clusters imaged by the iKundu et al. 



(119991 ) and IWaters et al.l (120061 ) WFPC/2 studies, allowing us to test our photome tric con- 
sistency by comparing to those previously published. For the iKundu et al.l (119991 ) sample, 
there are 968 clusters matched between the two samples with magnitude differences 



V-VKundu = -0.048 ±0.127 

I-lKundu = 0.030 ±0.115 

{y -i)-{y -i)Kundu = -0.022 ±0.116 

This comparison uses a reanalysis of the iKundu et al. I Jl999h s ample that incorporates the 
most recent official WFPC/2 zeropoints. For the IWaters et al.l (120061 ) sample, there are 
clusters in common, with differences 



V — Vwaters 
I ^Waters 
{V-I)-{V- I)waiers 



0.073 ±0.098 
0.082 ±0.084 
-0.000 ±0.118 



The original published IWaters et al.l (120061 ) measurements included a 0.1 magnitude offset 
in the final V magnitudes due to an inadequate PSF used in the aperture correction. The 
differences presented here correct this error, and shows that this sample is generally consistent 
with these new measurements. For both of these samples, all bright clusters that fall within 
the ACS field of view are detected. 
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Although this data is extraordinarily deep, and the detection threshold has been set 
at a fairly liberal level, we still expect that we are not 100% efficient at detecting clusters. 
Conveniently, the simulated clusters used to construct the aperture correction also sample 
the completeness of objects in both filters. The fraction of clusters detected at each grid 
point in apparent instrumental magnitude and tidal radius was used to generate the expected 
completeness of the data objects. Although the detectability of an object is a function of 
the object's surface brightness, there is little dependence on the simulated cluster size, so 
for all further analysis of the completeness, we consider it only as a function of the apparent 
magnitude. As expected from our simple noise model of the galaxy, the detection efficiency, 
and hence the completeness, is a function of projected radius from the center of the galaxy. 
Due to the scaling of the detection threshold, clusters with low Rgai must have a larger flux 
peak to be detected at the same threshold as a cluster with larger Rgai- To minimize the 
effect this has on our calculated completeness values, we divide the total sample into two 
radius bins, separated at the median cluster radius, 68'.'95. The completeness is calculated 
independently for the two bins, which allows the completeness in the outer bin to extend to 
slightly fainter levels. Due to the requirement that objects be found on both the F606W and 
the F814W images, the final completeness levels are color dependent. For the range of colors 
we consider for our globular cluster sample, the median 50% completeness limit follows the 
trend V5o% ^ 25.4 + 0.8 — /). For the following analysis of the cluster bimodality, we use 
a stricter I band 90% limit, which follows ~ 24.5 — 0.1(\^ — /). 

The cluster detection method used was designed to detect as many objects as possible. 
Unfortunately, this also means that some fraction of the objects found are not true globular 

clusters. The two main sources of contaminating objects are background galaxies and noise 
fiuctuations from the galaxy light due to random local overdensitics in the stars that make 
up the galaxy. We have chosen to make a series of cuts in the measured parameters that 
serve to exclude as many of these contaminating objects as possible, while retaining the real 
globular clusters. In addition to the requirement that objects be detected in both filters, we 
also apply a color cut such that only objects with 0.5 < V — I < 1.7 are considered to be 
globular clusters. These limits are based on the observed colors of globular cluster systems 
and bracket the obvious feature in the color magnitude diagram of the sample of all detected 
objects. Next, we only wish to consider objects that have a final completeness value greater 
than 50%. Due to the fact that this completeness limit is so faint, we expect that only a 
few real clusters will be excluded by this cut. Since globular clusters are known to be round, 
we can place a cut on the ellipticity of the clusters in both filters, such that e\y ^ < |. This 
cut serves as an excellent way to exclude the many background spiral galaxies that are seen 
edge on. Finally, we exclude objects where the average surface brightness is brighter than 
the peak surface brightness. Visual inspection of the objects excluded by this cut show them 
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to be entirely high spikes in the galaxy noise. 

As a check of how well these cuts function to clean our catalog of objects that are not 
globular clusters, we can apply these same cuts to catalogs of objects that contain no globular 
clusters, and only contaminating objects. If the variations in the galaxy noise were purely 
Gaussian, then the number of expected noise maxima on the data image should match the 
number of minima. By multiplying the galaxy subtracted image by —1, and searching for 
objects on that image, we should be able to estimate the noise contribution. Unfortunately, 
the asymmetry of the image histograms for these images suggest that the fluctuations may 
not be Gaussian distributed, but are rather biased towards the positive fluctuations, as might 
be expected from the incipient detection of the brightest individual stars in M87. 

The contamination by background galaxies obviously contributes to the number of ob- 
jects found. A quick look at the galaxy subtracted image shows a large number of background 



spiral galaxies that are easy to identify. The Hubble Ultra Deep Field ( iBeckwith et al.ll2006l ) 
provides a dataset containing only background galaxies, and as such, gives a straightforward 
way to estimate the contribution we expect to find from these galaxies. The only complica- 
tion is ensuring that the data quality matches between the two datasets. We first rebinned 
the UDF images from a pixel scale of (/.'OSO to the pixel scale of our data ((/.'OS). Next, the 
UDF was overlaid on our image footprint, and trimmed to match the field of view. With the 
geometry matched, we considered the image noise. The UDF count rate was scaled to match 
the exposure time of the M87 data. The M87 galaxy model was then used to generate a noise 
model, which assumes that each pixel is drawn from a Gaussian distribution with mean zero 
and variance equal to the galaxy model value. This noise was then scaled such that when 
added to the UDF image, the final image variance equaled that of the galaxy subtracted 
image. This method makes it certain that objects in the UDF frame that fall "near the 
center of the galaxy" have more noise than objects "at the edge." The reduction for objects 
detected in the UDF images is identical to the M87 images, with the only exception being 
that the UDF filters are F606W and F775W. Since these filters can be converted to the same 
standard V and I system, the effect of this difference is minimal. 

The numbers of objects detected with each of the various cuts in the M87 data, the 
inverse images, and the UDF sample are given in table [TJ It is clear that the various 
restrictions on the data remove the majority of the contaminating objects, and yield a final 
number of clusters consistent with what is expected based on earlier surveys. The one object 
that is manually excluded from the sample is the HST-1 knot of the jet, which due to its 
brightness and compact size, successfully eludes these cuts. 
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3. Cluster Bimodality 



The color distribution of our extremely deep sample of globular clusters is shown in 
figure [H This distribution is clearly bimodal, as has been observed before for M87 and 
appears to be a commo n feature of globular cluster populations (e.g. iKundu fc Whitmore 



2OOII : iLarsen et al.ll200ll). The degre e of bim odality is generally quantified using the KMM 
test, as presented by I Ashman et al.l (119941 ). Briefly, this test determines the best fitting 
mixture of Gaussian distributions to a set of data points. For each Gaussian component 
included in the mixture model, the best fitting mean and variance are found, along with 
the number of points that belong to each component. This algorithm works quickly, using 
a maximum likelihood method to estimate the best fitting parameters based on only simple 
initial values. In addition, as the data are not binned in any way before the analysis, there 
is no uncertainty due to the choice of bin size. Closely separated groups, such as from a 
sample that does not appe ar bimodal visually, can often be disentangled given a sufficient 
number of data points (see lAshman et al.lll994l for details). 



The KMM test can fit any number of Gaussian components, under the assumption that 
all variances are the same (homoscedasticity) or that the variances are possibly different 
(heteroscedasticity). For our analysis of the M87 globular cluster system, we consider a 
maximum of two components, based on the obvious visual bimodality. However, we compare 
against a single component model to check the possibility that the data is better fit by this 
more simple model. This check is done using the likelihood ratio test which is defined as 

simple 

^ = mxUmple. 

This provides a quantity between zero and one that tells how likely the simpler model fits the 
data better. By comparing —2 In A to a distribution with a number of degrees of freedom 
equal to twice the difference in the number of free parameters between the two models, we 
can determine the point at which the more complicated model is a sufficient improvement 
over the simple model to justify the additional free parameters. Therefore, using this test, 
we can check that our data is truly fit better by the two component model by checking that 
the p-value for the unimodal model is small. 

One limitation of the KMM method is that outlier s in the distribution can significantly 
bias the results, especially for the heteroscedastic fits (lAshman et al.lll994l ). These models 
are biased the most as they can adjust the individual dispersions to larger values that better 
accommodate the outliers. This in turn will tend to increase the separation of the means of 
the two modes. As the dispersions of the homoscedastic models are coupled, the weight of 
an individual outlier is significantly decreased, so many more outliers are needed to create 
such an effect. 
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Given that heteroscedastic fits are more influenced by outliers, it is preferred to fit any 
population that does not obviously have different dispersions with homoscedastic models. 
The KMM test does not clearly state whether the input data is better fit by a homo- 
or heteroscedastic model. Bootstrapping the sample can provide an answer, as the value 
of —2 In A (as given above, using the likelihood of the homoscedastic and heteroscedastic 
models for the simple and complex cases, respective ly) calcu lated for each bootstrapped 



sample approximates the underlying distribution (jLdl2008l ). Therefore, the fraction of 
bootstrap samples with —2 In A more extreme than that calculated from the observed data 
is an estimate of the p-value that the data is truly better fit by a heteroscedastic model. 

To determine whether our data are better described by a homo- or heteroscedastic 
model, we fit the cluster sample using both models and bootstrapped as described above to 
estimate PheteroscedasUc- The sample used in this fit and subsequent analysis is restricted to 
those clusters with color between 0.7 < V — / < 1.5 to limit the effect of the largest outliers. 
This sample of clusters was then fit using both models and bootstrapped as described above 
to determine the probabihty that the data is more accurately described by a heteroscedastic 
model. The resulting dispersion for the homoscedastic model is c homoscedastic = 0.105 and for 
the heteroscedastic model auue = 0.126 and ared = 0.090. The similarity of these dispersions 
is supported by the our statistical test, which gives p heteroscedastic = 0.03, excluding the 
heteroscedastic fit for the entire sample at the 95% confidence level. This does not mean that 
the underlying color dispersions of the red and blue subpopulations are necessarily identical, 
only that there is no statistical support for fitting the distributions with different dispersions. 
Therefore, the remainder of this paper focuses on the results for the homoscedastic models, 
which we again note are much more robust against the influence of outliers. 

As we wish to investigate any trends in bimodality wit h cluster mass, we rieed to divide 



the c lusters into luminosity bins. Following previous studies (jStrader et al.ll2006l : lHarris et al. 



20061 ). we use the I magnitude as a tracer of cluster mass. Two types of magnitude bins were 
used for this study. The first method uses fixed width bins from / = 19.5 to / = 24.5, using 
bin widths of 0.5 and 1.0 magnitudes. The mean value of I is calculated for each bin, which 
due to the shape of the globular cluster luminosity function, does not fall exactly at the 
center of the bin range. Unfortunately, the KMM test as originally stated does not directly 
yield uncertainties for the parameters calculated. Since the algorithm conv erges quickly. 



it is reasonable to estimate these uncertainties by bootstrapping the sample (IBasford et al 



19971 ). Therefore, the best fitting means, variances, and population ratios were calculated 
with their associated uncertainties from 100 bootstrapped samples in each magnitude bin. 
These values are listed in |2] and [21 with the results for the 0.5 magnitude bins plotted in 
figure [21 
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The second binning method uses running samples of 100 clusters. The clusters are sorted 
in I magnitude, and the KMM test run on the first 100 points. For the next bin, the brightest 
cluster in the sample is removed and the next cluster from the sorted list of magnitudes is 
included, and the process repeated. As these bins contain on average fewer clusters than any 
of the fixed width bins, the number of outlying points in any of the running bins should also 
be small. The effect of these outliers can be somewhat mitigated by bootstrapping, as not 
all samples will retain the discrepant points. However, taking the average values from these 
bootstrap results allows the outlying points to continue to influence the results. One solution 



to rem ove this influence is to use the bootstrap "bumping" procedure of lTibshirani &: Knight 



(119991 ). For each set of bootstrapped results, the fit with the largest likelihood is retained 
and the rest discarded. This makes the fitting resistant to outliers, as the small probability 
of finding such a point given the model ensures that any sample that contains the outlying 
point will naturally have a much lower likelihood than a sample that has excluded that point. 
Each bin was bootstrapped in this way with 20 samples, which should al low the fitting to 



be resistant to the influence of up to 5 outliers (ITibshirani fc Knightlll999l ). 



The best fitting models for the fixed width bins are shown in tables [2] and [3l We can see 
that based on these models, the unimodal description of the cluster colors is strongly ruled 
out over most of the range in magnitude considered. The only deviations from this trend 
occur at the very brightest and faintest bins. These devi ations are not surpris ing, as these 



bins have a smaller number of clusters. Simulations by I Ashman et al.l (119941 ) showed for 
cases like these, with a small number of objects {N ~ 50) and a modest ratio of component 
mean separation to component variance {Afi = ~ 2), there is a high probability of a 

truly bimodal distribution being unrecognized by the KKM algorithm 

Based on these fits, we can see that as we look at bins containing fainter clusters, the 
variances in the best fitting models increase. This is true for both subpopulations, and 
shows the effect the decrease in signal to noise has on the scatter in the measured colors. 
The population fractions for the red and blue fits show that as the bins move to fainter 
magnitudes, the blue population fraction falls. This is a result of the differences in the 
luminosity functions of the red and blue clusters and will be discussed in a future paper on 
the globular cluster luminosity function of this very deep data (Waters et al. 2008). 



4. Color-Metallicity and Mass-Metallicity Relations 

We use the means determined by KMM for the red and blue clusters as a function of I 
magnitude to investigate the existence of any color-magnitude trends in both populations. 
For each binning method, the best fitting trend is calculated for both the red and blue 
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subpopulations. For the fixed width bins that have errors from bootstrapping, these errors 
are used to weight the fits. This helps hmit the influence from the brightest and faintest 
bins, which tend to have large errors due to the low number of clusters. The running bins 
constrain the fit with equal weight, as the large number of these bins reduces the relative 
influence of any individual bin. 

These fits yield the best trend {V — I) = a + b ■ I , which ca n be converte d to a mass 



metallicity relation of the form [Fe/H] = k + a log^g w^- We follow iHarrid (119961 ) in creating 
a relation 

[Fe/H] = 5.2267(F - /) - 6.2613 (3) 

based on observations of Milky Way globular clusters, and use a constant mass to light ratio 
of M/ L = 3 for all clusters. Table H] shows the best fitting values for these trends for each of 
the binning methods. It is clear that for the blue subpopulation, all three binning methods 
yield similar best fits. To determine our formal best fit models, we average the three binning 
methods, and take the scatter in these values as our expected uncertainty in the fit due to 
the binning. Over the full range of clusters, we find an average best fit for the blue clusters 
oi Z (X ]Vf°-°^=''°-°^ . To offer a more direct comparison with the tilts measured in previous 
studies, we have also fit only those clusters brighter than the turnover (/ ~ 22.5). These fits 
are given in table [51 We find a similarly small Z oc mo oi±o o7 relation with these limits, and 
as they are only marginally different than the fits over the full magnitude range, we do not 
plot these trends separately. 

It is essential to characterize how well the formal uncertainties in the fit reflect the true 
uncertainty in the slope of the mass-metallicity relation. In order to test the accuracy of 
the calculated fits, we constructed simulated color-magnitude diagrams with a known tilt 
in the blue clusters. Each cluster in our data sample was checked against the 0.5 mag- 
nitude homoscedastic fixed width bins to determine if it was a likely member of the blue 
subpopulation. All of these blue clusters then had their color shifted, such that 

{V — I)new = iy — I)old + {O'trend — O^blue) + H^trend — buue) (4) 

where a^^e and buue are the best fitting trend in the blue clusters, and atrend and btrend define 
the trend we are adding. This method therefore preserves the cluster I magnitude, as well 
as the distribution of points around the best fit trend. These simulated color magnitude 
diagrams are then run through the KMM test, and the best calculated trend found. By 
comparing the difference in the input and output slope, we can estimate how large we can 
expect the errors to be in our mass-metallicity fits. 

Two simulated trends were used for this test, representing both extremes in the blue 
tilt. First, a strong tilt of Z oc M°'^^ like that claimed in some analyses of shallower data was 
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added, the results of which are shown in figure [31 The best fitting trend for this simulation 
is Z oc M^'^^^^'^^ {Z oc mO-44±o.05 including only those clusters brighter than the turnover), 
suggesting that the small formal errors calculated are not significant underestimates. As we 
recover this large input trend with high accuracy, it is clear that were such a trend truly 
present in our data, we would have no difficulty in detecting it. The main reason that such 
a large trend is so easy to recover is that it must have a large separation between the means 
of the red and blue subpopulations at faint magnitudes. This separation is much larger 
than the color dispersion of the clusters in our sample, which creates a gap between the two 
groups. Such a gap allows the KMM test to easily group the clusters, providing an excellent 
fit. 

The creation of a large gap is not an issue for the second simulation, in which the slight 
blue trend is completely removed, with a simulated trend of Z oc M^'^^. As there is only 
a slight shift in the cluster colors, the degree of blending between the red and blue clusters 
should be nearly identical to the real cluster data. The calculated trend for this simulation 
is Z oc ]\^-o-06±o.o6 ^2 oc ]Vf''-°''='=°-°^ including only those clusters brighter than the turnover). 
These tests on simulated data indicate that our fits to the color magnitude trends in the 
globular cluster subpopulations accurately refiect the underlying trend in the data with small 
well- understood uncertainties. The specific fitting uncertainties in the fit to the slope have 
a maximum of 0.06, which occurs for small real slopes. This uncertainty decreases to even 
smaller values when the input slope is large. However, our method does not directly address 
any possible systematic issues in the measurement of the colors and magnitudes. While 
our measurements are based on a careful analysis of this extraordinarily deep data, it is 
important to consider all possible systematic effects in the measurement. 

Our tests with artificial datasets indicate that we accurately recover the simulated slopes 
with a maximum uncertainty of 0.06. These tests provide an excellent way to assess the un- 
certainties in the fits to the data. However, this approach does not directly address potential 
systematic biases in the measurements of the colors and magnitudes. The most likely source 
of any systematic error in these is due to the aperture corrections. The extraordinary depth 
of our data allows a much more accurate determination of the sizes of the globular clusters, 
which directly leads to a much more accurate determination of the total magnitude of the 
globular clusters. These sizes are infiuenced by the choice of PSF used, and we simulated 
the effects of different PSFs on the resulting aperture corrections. Our simulations show 
that in this well sampled data, in which the globular clusters are spatially resolved, large 
differences in the mag nitudes (of up to 0. 1 ma gnitude) only occurred with PSFs that were 



inconsistent with our [Anderson fc King] (j2006|) PSFs and were not good fits to the single 



unsaturated star in the image. Using such poor PSFs can create tilts of up to ±0.2 in the 
final mass metallicity relation for both subpopulations on top of the modest tilts seen in 
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our fitting. This extreme allows us to note that any possible systematic effect due to the 
aperture correction must be smaller than this level, and that both of the slopes found in our 
data fall within this range of a < |0.2| ± 0.06. 



Discussion 



Our 50 orbit ACS observation of M87 shows no significant relation between the colors 
of the blue metal poor clusters and their luminosity, with a formal best fit of Z oc M^"*^ 
from 19.5 < I < 24.5, and a conservative upper limit on the slope of any relation in the M87 
globular cluster population of a < |0.20| ± 0.06 including systematic effects. A similarly 
small trend is found when the fitting is restricted to only the bright clusters above the 
luminosity function turnover (19.5 < I < 22.5). This absence of any significant mass- 
metallicity relation rules out some earlier claims of such a trend from much shallower data. 
Some earlier work investigated the mass metallicity relation of metal poor globular cluster 
populations in much shallower images of nea rly ellipticals, including single orbit d ata for 
M87 and several other bright Virgo ellipticals (jStrader et al.ll2006l : iMieske et al.ll2006l ). single 
orbit pointings of NGC 4594 ( Spitler et al. 20061 ). and several distant elliptical s with longe r 



expo sures and thus similar signal to noise as the nearby single orbit observations (IHarris et al. 



20061 ). These shallower studies suggested that the metal poor globular cluster populations of 
some of these galaxies, including M87, had a blue tilt and inferred a mass metallicity relation 
of about Z oc M°'^^. These results are clearly not confirmed in our 50 orbit data, which 
places an upper limit on any mass-metallicity relation that is much smaller than this strong 
trend. There are also several ground based studies of the color-magnitude trends in globular 



cluster systerns tha t find a variety of re sults, including bot h blue tilts JPorte etal 



Wehner et all 120081 1 as well as red tilts flBassino et all l2008l : iLee et all 120081 ). iForte et al. 



2007 



(120071 ) examined M87, and found evidence for mass-metallicity relation of Z oc M°'^^, which 
although smaller than the claims from space-based studies, is still inconsistent with the lack 
of a tilt in our much deeper data. This emphasizes the difficulty in accurately determining 
color tilts from data with low signal to noise and poor spatial resolution. 

The clear absence of a significant mass metallicity effect in our data contrary to the very 
strong effect in single orbit data for galaxies like M87 and in data with similar signal to noise 
at larger dist ance, highlights the need for very deep observations to address this question. 
Kundul (120081 ) has invest igated this qu estion in detail. We do not reproduce this extensive 
work here, but note that iKundul (120081 ) identifies two major issues that arise in single orbit 
data or data with similar signal to noise for the bulk of the clusters. First, such data lacks 
the depth necessary to accurately follow the sizes of globular clusters with magnitude. This 
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lack of size discrimination will cause size dependent photometric errors. Secondly, the error 
bars for detections are not symmetric in color and magnitude. These two effects are shown 
to be able to produce apparent color magnitude trends of the level seen in the l ower signal 
to noise data from an underlying distribution with no trend at all (iKundd l2008l ) . 



The absence of a significant mass-metallicity relation for globular clusters also suggests a 
fundamental difference between globular clusters and ga laxies, as galaxies have a well-known 
mass metallicity relation. Specifically, using SDSS data, iTremonti et al.l (120041 ) found a mass 
metallicity relation of Z oc M^'^ fo r a very large sa mple of galaxies. This was extended to 
nearby dwarf irregular galaxies by iLee et al.l (120061 ) . who found a mass-metallicity relation 
consistent with that for the more massive galaxies. This dwarf galaxy sample extends down 
to M ~ 10^ — 10^, the range where the most massive globular clusters are found, and thus is 
suggestive of a fundamental difference between globular clusters and galaxies in their mass- 
metallicity relations. Such a difference likely reflects differences in the formation histories 
of galaxies and globular clusters. If globular clusters experience significant self-enrichment, 
then the more massive clusters will appear more metal-rich, since their greater mass will 
enable them to retain more metals from earlier generations of stars. Any self-enrichment of 
metals that affect the broad-band colors will make the more massive clusters redder, and 
produce color-luminosity and mass-metallicity relations within a globular cluster system. 
Therefore, the weakness or absence of the observed color-luminosity and mass-metallicity 
relations for globular clusters thus sets a limit on the role of self-enrichment, and is a key 
target for future models of globular cluster formation. 

A natural explanation for the difference between the mass-metallicity relations of glob- 
ular clusters and galaxies is that globular clusters form without extensive mass distributions 
or dark matter halos. Without such halos, globular clusters are unable to retain the mate- 
rial produced by their massive stars, preventing the formation of subsequent generations of 
metal-enriched stars. Such a picture is consistent with the compact, dense nature of globular 
clusters which implies short formation timescales, and with models in which globular cluster 
formation is a rapid, dynamic process in a high pressure starburst en yironment as suggested 



by observations of globular cluster formation in the local universe (lAshman &: Zepj 12001 



Elme green fc Efremovlll997l ). In contrast then, galaxies tend to form over time within larger 
dark-matter dominated structures that help retain metals to be incorporated in subsequent 
generations to produce the observed galaxy mass-metallicity relation. 
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Table 1. Number of Detected Objects 



Cut 


-^clusters 


■'■^inverse 


NuDF 


Matched 


5392 


136 


242 


Completeness > 50% 


3996 


45 


231 


0.5 < F- / < 1.7 


2832 


13 


110 


Ellipticity 


2168 


4 


31 


Surface Brightness 


2090 


2 


31 


Final 


2089 


2 


31 



Table 2. Best fit values: 0.5 magnitude bins/homoscedastic 



{V - I)blue f^blue {V - I)red '^red '^blue T^red Punimodal -^clusters 



19.295 


1, 


.094±0, 


.035 


0, 


.046±0, 


,009 


1 


.227±0, 


,025 


0. 


,046±0, 


.009 


0, 


.460±0. 


,196 


0, 


.540±0, 


,196 


0, 


.312 


15 


19.788 


0, 


.942±0, 


.064 


0, 


.115±0. 


.011 


1 


.215ib0. 


.053 


0. 


,115±0, 


.011 


0, 


.419±0. 


,187 


0, 


.581±0, 


.187 


0, 


.269 


42 


20.274 


0, 


.999±0, 


.019 


0, 


.088±0. 


.009 


1, 


.236±0. 


.023 


0. 


.088±0, 


.009 


0, 


.535±0. 


.082 


0, 


.465±0, 


.082 


0, 


.012 


90 


20.758 


0, 


.990±0, 


.022 


0, 


.095±0. 


.009 


1 


.221±0. 


.017 


0. 


.095±0, 


.009 


0, 


.430±0. 


.067 


0, 


.570±0, 


.067 


0, 


.016 


134 


21.252 


0, 


.994±0, 


.020 


0, 


.092±0. 


.009 


1, 


.250±0. 


.017 


0. 


.092±0, 


.009 


0, 


.432±0. 


.068 


0, 


.568±0, 


.068 


0, 


.000 


170 


21.752 


1, 


.007±0, 


.011 


0, 


.092±0, 


,006 


1 


.268±0, 


,012 


0, 


,092±0, 


.006 


0, 


.415±0, 


,043 


0, 


.585±0, 


.043 





.000 


262 


22.267 


1, 


.002±0, 


.009 


0, 


.086±0, 


,005 


1 


.262±0, 


,008 


0. 


,086±0, 


.005 


0, 


.424±0. 


,035 


0, 


.576±0, 


.035 


0, 


.000 


332 


22.737 


0, 


.974±0, 


.015 


0, 


.106±0. 


.005 


1 


.276±0, 


.011 


0. 


,106±0, 


.005 


0, 


.325±0. 


,036 


0, 


.675±0, 


.036 


0, 


.000 


288 


23.223 


0, 


.982±0, 


.020 


0, 


.112±0. 


.006 


1 


.274±0, 


.012 


0. 


.112±0, 


.006 


0, 


.370±0. 


.042 


0, 


.630±0, 


.042 


0, 


.000 


270 


23.722 


1, 


.004±0, 


.022 


0, 


.112±0. 


.008 


1, 


.293±0, 


.015 


0. 


.112±0, 


.008 


0, 


.384±0. 


.063 


0, 


.616±0, 


.063 


0, 


.000 


189 


24.227 


0, 


.948±0, 


.065 


0, 


.128±0. 


.012 


1 


.288±0, 


.033 


0. 


.128±0, 


.012 


0, 


.262±0. 


.128 


0, 


.738±0, 


.128 


0, 


.059 


55 



Table 3. Best fit values: 1.0 magnitude bins/homoscedastic 



I 


{V - I)uue 


<^blue 


{V - I)red 


^red 






Punimodal 


^clusters 


19.658 


0.888±0.073 


0.111±0.012 


1.175±0.027 


0.111±0.012 


0.198±0.122 


0.802±0.122 


0.074 


57 


20.564 


0.995±0.012 


0.093±0.007 


1.227±0.015 


0.093±0.007 


0.468±0.049 


0.532±0.049 


0.001 


224 


21.555 


l.OOliO.OlO 


0.093±0.005 


1.262±0.009 


0.093±0.005 


0.427±0.029 


0.573±0.029 


0.000 


432 


22.485 


0.991±0.009 


0.097±0.004 


1.269±0.007 


0.097±0.004 


0.372±0.024 


0.628±0.024 


0.000 


620 


23.428 


0.990±0.014 


0.113±0.005 


1.281±0.010 


0.113±0.005 


0.375±0.036 


0.625±0.036 


0.000 


459 
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Table 4. Best fit trends 



Model Ubiue hiue hlue Olblue Cired Wed Ked Oired 



Fixed 0.5 mag 1.184 -0.008 -1.749 0.110 0.908 0.016 1.666 -0.209 

Fixed 1.0 mag 1.029 -0.002 -1.197 0.021 0.824 0.020 1.972 -0.258 

100 point running bins 1.199 -0.009 -1.728 0.114 1.015 0.011 1.231 -0.144 



Table 5. Best fit trends above turnover 



Model auue hlue hlue Oiblue O-red Wed Ked Olred 



Fixed 0.5 mag 1.135 -0.006 -1.538 0.080 0.906 0.016 1.672 -0.210 

Fixed 1.0 mag 1.009 -0.001 -1.118 0.009 0.733 0.024 2.338 -0.314 

100 point running bins 0.928 0.004 -0.596 -0.054 0.794 0.021 2.118 -0.278 
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Fig. 1. — Histograms of the cluster colors in three bins of magnitude, each containing 
632 clusters, along with the best fitting KMM component models for the red and blue 
populations as well as the sum of these two components (shown in green). The top panel 
shows the brightest clusters, with / < 21.94. The middle panel shows fainter clusters, with 
21.94 < I < 23.06. The bottom panel shows the faintest clusters. The constancy of the 
peaks in the red and blue populations indicate the absence of any strong tilts in the M87 
globular cluster system. 
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Fig. 2. — Color trends with I magnitude for the globular clusters in these data, based on 0.5 
magnitude width bins with homoscedastic fits. The errors are calculated via bootstrapping. 
The best fitting trends show effectively no blue tilt {Z oc ]Vf0-08±0-05~)^ g^^^ jg significantly 
different than the previously published trend Z oc M°'^^ shown as a dotted line. For reference, 
the median 90% completeness line is shown as a dot-dashed line. 
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Fig. 3. — Color magnitude diagram of a set of simulated clusters with a mass-metallicity 
trend defined to be Z oc M°-^^ (left). As shown by the best fitting line (Z oc M°-^^^^-^^), 
if the clusters truly had such a trend, it would be easily identifiable in our data. The right 
panel shows the results of fitting simulated clusters with no mass-metallicity trend at all. 
The best fitting line for this data is also shown {Z oc M^'^^^^'^^), which falls within the 
uncertainty of our fits. 



